Abstract



In this article we evaluate the statistical evidence that a population of students learns about
the sub-game perfect Nash equilibrium of the centipede game via repeated play of the game.
This is done by formulating a model in which a player's error in assessing the utility of decisions
changes as they gain experience with the game. We first estimate parameters in a statistical
model where the probabilities of the players' choices are given by a Quantal Response Equilibrium
(QRE) (McKelvey and Palfrey, 1995, 1996, 1998), but are allowed to change with repeated
play. This model gives a better fit to the data than similar models previously considered. However,
substantial correlation of outcomes of games having a common player suggests that a
statistical model that captures within-subject correlation is more appropriate. Thus we then
estimate parameters in a model which allows for within-player correlation of decisions and rates
of learning. Throughout the paper we also consider and compare the use of randomization tests
and posterior predictive tests in the context of exploratory and confirmatory data analyses.



Keywords: Bayesian inference, centipede game, dyadic data, game theory, interaction/relational 
data, hierarchical modeling, posterior predictive tests, quantal response equilibrium, randomization 
tests. 
1 Introduction



Decision making under uncertainty has long been of interest to a wide variety of academic
disciplines: biology, computer science, economics, mathematics, philosophy, political science, and
statistics, to name a few. The main mathematical method for examining multi-agent decision theory has
been game theory. However, the game theoretic solutions of some simple games have been called
into question, with a classic example being the Sub-Game Perfect Nash Equilibrium (SPNE) of the
centipede game. In experimental settings, individuals rarely choose the SPNE solution (McKelvey
and Palfrey, 1993). To explain this, McKelvey and Palfrey (1995, 1996, 1998) suggest that players'
strategies can be represented by a Quantal Response Equilibrium (QRE), in which players' choices
deviate from the SPNE because of "mistakes" in decision making. The mistakes or errors may be
due to lack of information, information overload, the fact that human beings are not perfect
optimizers, or, as is often the case, the fact that they are not optimizing according to the specific
criterion set out by researchers in a given study.

Our examination of the data collected by McKelvey and Palfrey (1993) on the centipede game
shows that on average players move toward the SPNE with repeated play. This idea of moving
toward a game theoretic equilibrium through repeated play has been called learning by Fudenberg
and Tirole (1991). An extensive amount of theoretical work has been written on the subject, as
well as a fair amount of empirical work based on the centipede game (El-Gamal, McKelvey and
Palfrey, 1993). We expand the QRE framework, which allows for a statistical interpretation of
game theoretic models, by allowing the error distribution to change as players gain experience. We
also build upon the notion of heterogeneity of players, discussed by McKelvey and Palfrey (1996),
first through the introduction of different parameters for each type of player in the game and finally
by expanding that notion to a statistical random effects model that allows for heterogeneity over all
the subjects in the data set. The models we employ represent the data better than previous
models, using the Bayesian Information Criterion (BIC) as a measure of adequacy (Kass and
Raftery, 1995). The outline of the paper is as follows. In Section 2, the game and the experimental
design are discussed. In Section 3, an exploratory data analysis is presented. In Section 4, several
models are examined that allow the distribution of a player's error to change through experience. In
Section 5, the model with the best BIC is developed further to allow for heterogeneity of players in
the data through a random effects model which accounts for the correlation of outcomes involving
a common player. The paper then ends with a discussion of model limitations and the potential
for future investigation.



2 The Game and Experimental Design 

The data were gathered by McKelvey and Palfrey (1993) based upon the four-stage centipede
game shown in Figure 1. A single run of the centipede game involves two player types: Player
A and Player B. Player A initiates the game and in the first stage has an opportunity to either 
Take or Pass. If Player A chooses Take, the game ends and Players A and B receive 40 and 10 
cents, respectively. If Player A passes, then Player B has an opportunity to either choose Take or 
Pass. Again, if Player B chooses Take the game ends and Players A and B receive 20 and 80 cents, 
respectively. At each subsequent stage the dollar amounts are doubled and switched between Player 
A and Player B. The fourth stage is the last regardless of whether Player B chooses Take or Pass. 
Based upon this pattern, there are five possible outcomes of the game ($y \in \{1, \ldots, 5\}$), where an
outcome is the total number of stages played in the game. The traditional game theoretic solution,
the SPNE, can be determined via backwards induction; at the fourth stage, based upon utility 
maximization under the assumption that a Player's utility is determined solely by the monetary 
outcomes, it seems natural for Player B to choose Take. If the game were to reach stage 3, then 
Player A should realize this and following a similar argument would choose Take in the third stage. 
This continues backwards through the game tree, yielding the unique solution that Player A should 



choose Take at the first stage. However, this solution is Pareto inferior (Mas-Colell, Whinston and
Green, 1995), since both players would strictly benefit by moving further out in the game (stage 3
and beyond). 
Figure 1: M-P four-stage centipede game; T = Take, P = Pass; y denotes the outcome of the game.

The data were collected in three different sessions, two of which consisted of 20 and 18 students 
from Pasadena Community College, and a third session of 20 students from the California Institute 
of Technology. Each subject took part in only one of the three sessions. In each session, subjects 
were randomly assigned to be either a Player A or Player B and this assignment was kept throughout 
their allotted session. In the first and third sessions, subjects played 10 games, while in the second 
session they played only 9 games. After each game, the subjects who were Players B were rotated 
so that two subjects never played each other more than once. The layout for the two 10-game
experiments can be seen in Table 1. The subjects who are Players A and Players B are on the
rows and columns, respectively. The $(i,j)$th entry of the table indicates the game number played
between the $i$th Player A and the $j$th Player B. Since each individual plays only 10 games, the
experimental design is a Latin square (i.e., in each row and column each game number appears
only once). Notice that we could place the game number on the columns and fill in the table with
the Players B and we would still have a Latin square. In fact, any assignment of Players A,
Players B, and game number to rows, columns, and table entries is a Latin square. Let $y_{[i,j](s)}$
be the outcome in $\{1, \ldots, 5\}$ for the game played between the $i$th Player A and $j$th Player B in the
$s$th session, where $i \in \{1, \ldots, N(s)\}$ and $j \in \{1, \ldots, N(s)\}$. Since session 2 has only 18 subjects,
$N(s = 2) = 9$, compared to $N(s = 1, 3) = 10$. Thus the total number of cases in the data is

\[ \sum_{s=1}^{3} N(s) \times N(s) = 10^2 + 9^2 + 10^2 = 281. \]

Finally, in an attempt to conform to the notions of rationality required by game theoretic
solutions, the structure of the game, the number of times the game would be played, and the payment
structure were made common knowledge to all the subjects. This was done through the reading of a set
of instructions, a practice session, and the administration and correction of a quiz. It is
important to note that the subjects were not "taught" what an optimal strategy was in any sense.
Additionally, the games were conducted on computers so that the subjects did not know whom they
were playing against, and at the end of each session the subjects were privately paid the amount
of money they had earned from the 9 or 10 games. Further discussion of the experimental design
and data collection can be found in McKelvey and Palfrey (1993).

Table 1: Latin square design for twenty subjects (10 Players A and 10 Players B); the game
number is a table entry.
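
The Latin square property and the count of 281 games can be checked with a short sketch. The cyclic rotation used here is only an assumption about how the Players B might have been rotated; the actual layout of Table 1 may differ, and the function names are illustrative.

```python
def cyclic_latin_square(n):
    """Hypothetical schedule: entry (i, j) is the game number (1..n) in which
    the i-th Player A meets the j-th Player B under a cyclic rotation."""
    return [[(j - i) % n + 1 for j in range(n)] for i in range(n)]


def is_latin_square(table):
    """True if every game number 1..n appears exactly once in each row and column."""
    n = len(table)
    symbols = set(range(1, n + 1))
    rows_ok = all(set(row) == symbols for row in table)
    cols_ok = all({table[i][j] for i in range(n)} == symbols for j in range(n))
    return rows_ok and cols_ok


session_sizes = [10, 9, 10]  # N(s) for sessions 1, 2 and 3, as stated in the text
squares = [cyclic_latin_square(n) for n in session_sizes]
total_games = sum(n * n for n in session_sizes)  # 10^2 + 9^2 + 10^2
```

Each row of a square is one Player A's schedule and each column is one Player B's schedule, so the Latin property is exactly the statement that every subject plays each game number once.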



3 Exploratory Data Analysis 

The left panel of Figure 2 presents the frequencies of the outcomes for all the games played for the
combined sessions. The traditional game theoretic solution is for Player A to choose Take at the 
first stage. If the subjects actually played in this manner all the mass in the histogram would be 
contained on outcome 1. However, most of the mass occurs on outcomes 2 and 3. Surprisingly, we 
see some mass on outcome 5 even though we would expect a subject reaching this stage to examine 
their payoffs of $3.20 versus $1.60 and choose Take. 

Since it is clear that most subjects do not play the SPNE, a primary scientific question of 
interest is whether subjects, through repeated gaming, move toward the SPNE. In the right panel 
of Figure 2, a scatter plot of the number of games played against the five possible outcomes of the
game is presented with a locally smoothed regression curve (see footnote 1). Due to the discrete nature of the data, the
points were jittered. The decreasing trend of the smoother suggests that on average the subjects 
move toward the first outcome with repeated play, which is the SPNE. 



Footnote 1: The default setting of the lowess() function in the R statistical package was used.

Figure 2: The histogram of the five outcomes and a scatter plot of the number of games played
against the five outcomes with a loess smoother.

3.1 Randomization Test of Trend

The smoother suggests that the relationship between the number of games played and the outcomes
of the games is approximately linear. The slope of the linear component of the trend was estimated
as $\gamma_{obs} = -0.067$ via least squares. Because of the potential for dependent outcomes due to
repeated play by each subject, typical regression standard errors are inappropriate. To overcome
this, we conducted a randomization test to examine whether the observed slope was statistically
different from zero (Fisher, 1935; Box and Anderson, 1955; Besag and Diggle, 1977). As Besag
and Diggle (1977) state, "a primary advantage of [randomization] testing is that the investigator
is free to use a variety of informative statistics of his own choosing, rather than be dictated to
by known distributional theory. Indeed, even when the relevant asymptotic distribution theory is
available, [randomization] testing provides an exact alternative for small samples." Thus in the
randomization testing framework, which does not consider tests based on population parameters,
our null hypothesis is $H_0$: the game number does not affect the outcome. In order to investigate
this hypothesis, we consider the slope as our chosen test statistic. The test proceeds by randomly
sampling appropriate permutations of the data, $y_{perm}$, under $H_0$, then computing $\gamma(y_{perm})$
for each permutation, and finally comparing this null distribution to the statistic calculated from
the observed data, $\gamma(y_{obs})$. We now provide further details on this procedure. In conducting the
test we need to be faithful to the design, so the permutations were done according to a Latin
square design (Cox, 1958). For each of the three sessions, the data can be represented by a Latin
square with the Players A represented on the rows and Players B on the columns, as in Table 1.
For each pair of players $A_i$ and $B_j$, information exists on the outcome of the game that the pair
played, as well as the current game number. The rows and columns of this matrix were permuted
while keeping fixed the row and column labels and the outcome of the game for that pair. Each
permutation shuffled the "times" at which the games were played but maintained who played each
game and the outcome. This was done for each session. The data from the three sessions were then
placed together and the slope of the linear trend, $\gamma(y_{perm})$, was estimated. One thousand values of
$\gamma(y_{perm})$ were sampled in this manner and compared to $\gamma(y_{obs})$. The results of the randomization
test are displayed in Figure 3. The approximate one-sided p-value, $P[\gamma(y_{perm}) < \gamma(y_{obs}) \mid H_0]$, was 0
(i.e., none of the 1,000 permuted statistics was as extreme as the observed value), suggesting
that we reject the null hypothesis.
Figure 3: The approximate null distribution and observed test statistic are compared. The slope
of number of games played against the five outcomes was used as the test statistic of interest and
the associated p-value for the randomization test was 0.
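
The permutation scheme of Section 3.1 can be sketched as follows. This is an illustration on synthetic stand-in data, not the McKelvey and Palfrey data; the cyclic game-number layout, the synthetic outcome generator, and all function names are assumptions made for the example.

```python
import numpy as np


def cyclic_latin_square(n):
    """Assumed game-number layout: rows are Players A, columns are Players B."""
    i, j = np.indices((n, n))
    return (j - i) % n + 1


def ls_slope(x, y):
    """Least-squares slope of y regressed on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))


def permute_rows_and_columns(games, rng):
    """Shuffle the 'times' by permuting rows and columns of the game-number
    square; the result is still a Latin square and outcomes stay in place."""
    n = games.shape[0]
    return games[rng.permutation(n)][:, rng.permutation(n)]


def randomization_test(sessions, n_perm=1000, seed=0):
    """sessions: list of (game_numbers, outcomes) pairs, one per session."""
    rng = np.random.default_rng(seed)
    y_all = np.concatenate([y.ravel() for _, y in sessions])
    g_all = np.concatenate([g.ravel() for g, _ in sessions])
    observed = ls_slope(g_all, y_all)
    null = np.empty(n_perm)
    for b in range(n_perm):
        g_perm = np.concatenate(
            [permute_rows_and_columns(g, rng).ravel() for g, _ in sessions]
        )
        null[b] = ls_slope(g_perm, y_all)
    # One-sided p-value: share of permuted slopes at or below the observed slope.
    return observed, null, float(np.mean(null <= observed))


# Synthetic stand-in outcomes with a built-in downward trend (NOT the real data).
data_rng = np.random.default_rng(42)
sessions = []
for n in (10, 9, 10):
    games = cyclic_latin_square(n)
    noisy = 4.0 - 0.3 * games + data_rng.normal(0.0, 1.0, (n, n))
    sessions.append((games, np.clip(np.rint(noisy), 1, 5)))

observed, null, p_value = randomization_test(sessions)
```

Permuting whole rows and columns, rather than individual cells, is what keeps each permuted schedule faithful to the Latin square design.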



4 Multinomial Models 

Since the outcomes of the centipede game take on five values, it is natural to model the data using a
multinomial distribution; the key question pertains to the parameterization of the probabilities of
each outcome occurring. The SPNE is perhaps the simplest model and states that the probability of the
first outcome is always one, $P[y_{[i,j](s)} = 1] = 1 \;\forall\, i, j, s$, which is clearly not a good model for these
data. Considering this, McKelvey and Palfrey, in a series of papers (McKelvey and Palfrey, 1995,
1996, 1998), relaxed this criterion through the development of the Quantal Response Equilibrium
(QRE) model based on the work in statistical choice modeling by McFadden (1973). Their model
still uses the decision making process to inform the specification of the probabilities of the five
outcomes, but differs from the SPNE by allowing players to make mistakes through a stochastic
component added to players' decisions. We expand upon this model to capture the observed mean
trend over time, which could be interpreted as learning. We also expand the model to allow for
heterogeneity in the type of players (i.e., whether the subject is a Player A or Player B). The section
ends with a Bayesian analysis of the model with the best BIC and a confirmatory data analysis that
will be used to motivate the random effects model of the following section.

4.1 McKelvey and Palfrey's Original Models 

McKelvey and Palfrey (1998) fit two different QRE models to the four-stage centipede data as
examples of the QRE methodology which they developed (McKelvey and Palfrey, 1995, 1996,
1998). We will present their one-parameter QRE model since it will serve as the basis for the models
presented in this paper. The QRE model parameterizes the probabilities of players' decisions as
functions of their payoffs and the precision (or variance) of their errors. Based upon the extensive
form of the game depicted in Figure 1, the decision probabilities that need to be specified for each
pair $A_i$, $B_j$ within each session $s$ are:

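\begin{align*}
q2_{[i,j](s)} &= P[\text{Player } B_j \text{ chooses Take at stage 4} \mid \text{Player } B_j \text{ reaches stage 4 when playing against } A_i], \\
p2_{[i,j](s)} &= P[\text{Player } A_i \text{ chooses Take at stage 3} \mid \text{Player } A_i \text{ reaches stage 3 when playing against } B_j], \\
q1_{[i,j](s)} &= P[\text{Player } B_j \text{ chooses Take at stage 2} \mid \text{Player } B_j \text{ reaches stage 2 when playing against } A_i], \\
p1_{[i,j](s)} &= P[\text{Player } A_i \text{ chooses Take at stage 1 when playing against } B_j].
\end{align*}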



For example, for a Player B to choose Take at the fourth stage, the perceived utility
gained from that choice should be greater than the perceived utility gained by choosing
Pass. The perceived utilities that drive a Player B's decision are modeled as
$U_B(y_{[i,j](s)} = 4) + \alpha(q2)_{[i,j](s)}$ versus $U_B(y_{[i,j](s)} = 5) + \gamma(q2)_{[i,j](s)}$,
where $U_B(y_{[i,j](s)} = 4)$ and $U_B(y_{[i,j](s)} = 5)$ are the monetary payoffs for the outcomes four
and five. For the rest of this paper we will make this assumption; however, the authors note
that with larger monetary values other utility functions may be more appropriate. Finally,
the $\alpha$'s and $\gamma$'s are random deviations that can vary across players, games, and stages within
a single game. Therefore the probability that a Player B chooses Take is:

\begin{align*}
q2_{[i,j](s)} &= P[U_B(y_{[i,j](s)} = 4) + \alpha(q2)_{[i,j](s)} > U_B(y_{[i,j](s)} = 5) + \gamma(q2)_{[i,j](s)}] \\
&= P[3.20 + \alpha(q2)_{[i,j](s)} > 1.60 + \gamma(q2)_{[i,j](s)}] \\
&= P[\gamma(q2)_{[i,j](s)} - \alpha(q2)_{[i,j](s)} < 3.20 - 1.60] = P[\epsilon(q2)_{[i,j](s)} < 3.20 - 1.60].
\end{align*}

McKelvey and Palfrey assume that the errors have a largest extreme value (lev) distribution
and that $\alpha(q2)_{[i,j](s)}$ and $\gamma(q2)_{[i,j](s)}$ are independent, so that their difference
$\epsilon(q2)_{[i,j](s)}$ has a logistic distribution: the QRE Multinomial Logit model (McFadden,
1973; McKelvey and Palfrey, 1995, 1996, 1998). We will follow this convention throughout
the rest of the paper for computational convenience. Based upon this distributional choice
for the deviations, $q2_{[i,j](s)}$ can be determined explicitly. Assuming the $\epsilon$'s are distributed
from the logistic distribution with precision $\lambda$, we have:

\begin{align*}
\epsilon(q2)_{[i,j](s)} &\sim \text{logistic}(\text{shape} = 0, \text{precision} = \lambda), \\
q2_{[i,j](s)} &= \frac{1}{1 + e^{-\lambda(3.20 - 1.60)}}.
\end{align*}
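
The step from two independent extreme value errors to a logistic difference can be checked by simulation. The sketch below assumes that "precision $\lambda$" means the inverse of the common scale parameter of the two extreme value (Gumbel) errors; the function names are illustrative.

```python
import numpy as np


def q2_closed_form(lam, take=3.20, pass_=1.60):
    """Logistic choice probability for Take at stage 4."""
    return 1.0 / (1.0 + np.exp(-lam * (take - pass_)))


def q2_monte_carlo(lam, take=3.20, pass_=1.60, n=400_000, seed=1):
    """Simulate the perceived utilities directly with Gumbel errors.

    Assumption: both errors are Gumbel with scale 1/lam, so that their
    difference is logistic with scale 1/lam.
    """
    rng = np.random.default_rng(seed)
    scale = 1.0 / lam
    alpha = rng.gumbel(0.0, scale, n)   # error attached to Take
    gamma = rng.gumbel(0.0, scale, n)   # error attached to Pass
    return float(np.mean(take + alpha > pass_ + gamma))


exact = float(q2_closed_form(1.0))
simulated = q2_monte_carlo(1.0)
```

With $\lambda = 1$ the two values should agree to within Monte Carlo error, which is what the closed-form expression for $q2_{[i,j](s)}$ asserts.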



McKelvey and Palfrey (1998) analyze the centipede game depicted in Figure 1 as a
game of perfect information; thus the decision probabilities for the QRE are determined via
backwards induction. Using an expected utility argument, $p2_{[i,j](s)}$ can be determined from
$q2_{[i,j](s)}$ and the errors in a Player A's decision:

\begin{align*}
p2_{[i,j](s)} &= P[U_A(y_{[i,j](s)} = 3) + \alpha(p2)_{[i,j](s)} > U_A(y_{[i,j](s)} \ge 4) + \gamma(p2)_{[i,j](s)}] \\
&= P[U_A(y_{[i,j](s)} = 3) + \alpha(p2)_{[i,j](s)} > q2_{[i,j](s)} \times U_A(y_{[i,j](s)} = 4) \\
&\qquad + (1 - q2_{[i,j](s)}) \times U_A(y_{[i,j](s)} = 5) + \gamma(p2)_{[i,j](s)}] \\
&= P[1.60 + \alpha(p2)_{[i,j](s)} > q2_{[i,j](s)} \times 0.80 + (1 - q2_{[i,j](s)}) \times 6.40 + \gamma(p2)_{[i,j](s)}] \\
&= P[\epsilon(p2)_{[i,j](s)} < 1.60 - q2_{[i,j](s)} \times 0.80 - (1 - q2_{[i,j](s)}) \times 6.40] \\
&= \frac{1}{1 + e^{-\lambda(1.60 - q2_{[i,j](s)} \times 0.80 - (1 - q2_{[i,j](s)}) \times 6.40)}}.
\end{align*}

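By continuing to work backwards, the other two probabilities $(q1, p1)_{[i,j](s)}$ can be determined.
Based upon the four decision probabilities, the probabilities of the five outcomes of
the game can easily be determined as follows:

\begin{align*}
P[y_{[i,j](s)} = 1] &= \theta^{(1)}_{[i,j](s)} = p1_{[i,j](s)}, \\
P[y_{[i,j](s)} = 2] &= \theta^{(2)}_{[i,j](s)} = (1 - p1_{[i,j](s)})\, q1_{[i,j](s)}, \\
P[y_{[i,j](s)} = 3] &= \theta^{(3)}_{[i,j](s)} = (1 - p1_{[i,j](s)})(1 - q1_{[i,j](s)})\, p2_{[i,j](s)}, \\
P[y_{[i,j](s)} = 4] &= \theta^{(4)}_{[i,j](s)} = (1 - p1_{[i,j](s)})(1 - q1_{[i,j](s)})(1 - p2_{[i,j](s)})\, q2_{[i,j](s)}, \\
P[y_{[i,j](s)} = 5] &= \theta^{(5)}_{[i,j](s)} = (1 - p1_{[i,j](s)})(1 - q1_{[i,j](s)})(1 - p2_{[i,j](s)})(1 - q2_{[i,j](s)}).
\end{align*}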



Figure 4 shows the four decision probabilities $(q2, p2, q1, p1)_{[i,j](s)}$ plotted as a
function of $\lambda$. In each of the four panels, as $\lambda$ increases toward $\infty$, the probability of Take
goes to 1, which is the SPNE. When $\lambda = 0$, the probability between Take and Pass is 50/50
since a player is completely uncertain about which of the two choices is best.
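
The backward induction of Section 4.1 can be sketched in a few lines. The payoffs follow the description in Section 2; the expressions for $q1$ and $p1$ are not written out in the text and are obtained here by repeating the same expected utility argument, so they should be read as an assumption. Function and variable names are illustrative.

```python
import math

# Payoffs in dollars as (Player A, Player B), following Section 2.
TAKE = {1: (0.40, 0.10), 2: (0.20, 0.80), 3: (1.60, 0.40), 4: (0.80, 3.20)}
PASS_AT_4 = (6.40, 1.60)


def logistic_cdf(x, lam):
    """P[epsilon < x] for a logistic error with precision lam (stable form)."""
    z = lam * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def qre_decision_probs(lam):
    """Return (p1, q1, p2, q2) for the one-parameter QRE by backward induction."""
    q2 = logistic_cdf(TAKE[4][1] - PASS_AT_4[1], lam)
    a_value_stage4 = q2 * TAKE[4][0] + (1 - q2) * PASS_AT_4[0]
    p2 = logistic_cdf(TAKE[3][0] - a_value_stage4, lam)
    b_value_stage3 = p2 * TAKE[3][1] + (1 - p2) * (q2 * TAKE[4][1] + (1 - q2) * PASS_AT_4[1])
    q1 = logistic_cdf(TAKE[2][1] - b_value_stage3, lam)
    a_value_stage2 = q1 * TAKE[2][0] + (1 - q1) * (p2 * TAKE[3][0] + (1 - p2) * a_value_stage4)
    p1 = logistic_cdf(TAKE[1][0] - a_value_stage2, lam)
    return p1, q1, p2, q2


def outcome_probs(p1, q1, p2, q2):
    """Map the four decision probabilities to (theta1, ..., theta5)."""
    return [
        p1,
        (1 - p1) * q1,
        (1 - p1) * (1 - q1) * p2,
        (1 - p1) * (1 - q1) * (1 - p2) * q2,
        (1 - p1) * (1 - q1) * (1 - p2) * (1 - q2),
    ]
```

Evaluating `qre_decision_probs(0.0)` gives 0.5 for every decision, and a large precision such as `qre_decision_probs(50.0)` puts nearly all mass on the first outcome, in line with the two limiting cases described for Figure 4.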



4.2 A QRE Model of Learning 

In order to examine the possibility of learning within the QRE framework, we account for
the information about the $t$th game played by a pair $A_i$ and $B_j$ through the addition of a
covariate to McKelvey and Palfrey's base QRE model presented in Section 4.1. This is in line
with the work of Signorino (1999), where additional information about a game or the subjects
involved allows researchers to gain an understanding of how variation in covariates leads to
variation in outcomes of the game. Figures 2 and 3 suggested a decrease in the outcome of
the game as the number of games played by an individual increased. We consider modeling
this by allowing the magnitude of the precision parameter to change over repeated play,
leading to the following QRE parameterization:

\[ (\epsilon(p1), \epsilon(q1), \epsilon(p2), \epsilon(q2))_{[i,j](s)} \sim \text{logistic}(\text{shape} = 0, \text{precision} = \lambda e^{\beta t}). \]

In McKelvey and Palfrey (1998), the authors had hoped that modeling heterogeneity, in
terms of player type (A or B), would lead to a significant improvement in the fit of their
models, but they did not explore this possibility. With this consideration, we expand our
model by allowing for such differences:

\[ (\epsilon(p1), \epsilon(q1), \epsilon(p2), \epsilon(q2))_{[i,j](s)} \sim \text{logistic}(\text{shape} = 0, \text{precision} = \lambda_p e^{\beta t}), \quad p = \text{player type} \in \{A, B\}. \]

Figure 4: The probability of choosing Take at any point in the game goes toward 1 as the precision
parameter $\lambda$ increases.

This leads to the complete statistical specification of the model as follows:

\begin{align*}
& y_{[i,j](s)} \sim \text{multinomial}((\theta^{(1)}, \ldots, \theta^{(5)})_{[i,j](s)}), \\
& (\theta^{(1)}, \ldots, \theta^{(5)})_{[i,j](s)} \text{ are determined by the game tree in Figure 1 and the following QRE specification:} \\
& (\epsilon(p1), \epsilon(q1), \epsilon(p2), \epsilon(q2))_{[i,j](s)} \sim \text{logistic}(\text{shape} = 0, \text{precision} = \lambda_p e^{\beta t}), \quad p = \text{player type} \in \{A, B\}. \tag{1}
\end{align*}

Based upon the QRE model, the likelihood of the observed data is:

\[ L(\lambda_A, \lambda_B, \beta \mid y) = \prod_{s=1}^{3} \prod_{i=1}^{N(s)} \prod_{j=1}^{N(s)} \prod_{k=1}^{5} \left(\theta^{(k)}_{[i,j](s)}\right)^{\mathbf{1}[y_{[i,j](s)} = k]}. \]



The first two models in Table 2 are the results from fitting the two learning models,
one with a common $\lambda$ and the other with heterogeneity among player types. The next
two models (3 and 4) are for comparison and are discussed in McKelvey and Palfrey (1993,
1998). They consist of the one-parameter QRE model and a two-parameter model
which assumes "that there is some small probability that players are 'altruistic' (and hence
choose [Pass] at every opportunity)". Finally, we also fit a standard ordered multinomial
probit model which does not consider the underlying decision making process. All the models
were fit by maximum likelihood, which allows expeditious estimation and
model comparison via BIC. Using the BIC as a measure of fit, Model 2 appears to represent
the data better than the other models. For this reason it was investigated further using a
Bayesian approach in order to obtain credible intervals and examine goodness-of-fit statistics
through the use of posterior predictive tests. Additionally, since this model will be further
expanded by utilizing random effects, the move to Bayesian inference at this point is natural.



Number  Model                                     Parameters                                      LL*        BIC
1       Slope in the variance                     $\lambda, \beta$                                -417.81    -846.898
2       Slope in the variance with heterogeneity  $\lambda_A, \lambda_B, \beta$                   -380.195   -777.31*
3       M & P's original model                    $\lambda$                                       -424.91    -855.47
4       M & P's altruistic model                  $\lambda, q$                                    -402.5     -782.55
5       Ordered Multinomial Probit                $\alpha_1, \alpha_2, \alpha_3, \alpha_4, \beta$  -376.928   -782.048

Table 2: The table presents 5 different models that were initially investigated. Based on the BIC, 
Model 2 was investigated further. 
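
As a worked check of the arithmetic behind Table 2, the sketch below assumes the convention BIC = 2 LL - k log(n), with n = 281 games and k the number of parameters, so that larger values indicate a better model. This convention is inferred from the tabled values rather than stated in the text, and the function name is illustrative.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """BIC on the 'larger is better' scale: 2 * LL - k * log(n) (assumed convention)."""
    return 2.0 * log_likelihood - n_params * math.log(n_obs)


N_GAMES = 281  # total number of games across the three sessions

# (log-likelihood, number of parameters) taken from Table 2.
bic_model_1 = bic(-417.81, 2, N_GAMES)   # slope in the variance
bic_model_2 = bic(-380.195, 3, N_GAMES)  # slope in the variance with heterogeneity
bic_model_3 = bic(-424.91, 1, N_GAMES)   # M & P's original model
bic_model_5 = bic(-376.928, 5, N_GAMES)  # ordered multinomial probit
```

Under this convention Model 5 has the highest log-likelihood, but its five parameters are penalized enough that Model 2 comes out ahead, which is the comparison the text relies on.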



The Bayesian analysis for Model 2, based on Equation (1) of Section 4.2, was conducted with the
following diffuse priors:

\begin{align*}
\log(\lambda_A), \log(\lambda_B) &\sim \text{normal}(\text{mean} = 0, \text{variance} = 100), \\
\beta &\sim \text{normal}(\text{mean} = 0, \text{variance} = 100).
\end{align*}

The resulting posterior distribution is:

\[ \pi(\lambda_A, \lambda_B, \beta \mid y) \propto L(\lambda_A, \lambda_B, \beta \mid y) \times P(\lambda_A) \times P(\lambda_B) \times P(\beta). \]

The Bayesian estimation for the model was conducted using the Metropolis algorithm.
Each of the three parameters was updated separately. A total of 500,000 iterations of
the Metropolis algorithm were conducted and the first 20,000 were removed for burn-in.
The remaining iterations were thinned by sampling every 25th iteration, resulting in 20,000
sampled values of $\lambda_A$, $\lambda_B$, and $\beta$ from the posterior distribution; diagnostics suggested
convergence to the posterior distribution.
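
A component-wise random-walk Metropolis sampler with burn-in and thinning can be sketched as below. The target here is a toy log-posterior (three independent unit-variance normals) chosen only so the example is self-contained; it is not the QRE posterior, and the proposal scales, chain length, and function names are assumptions for illustration.

```python
import math
import random


def metropolis_componentwise(log_post, init, steps, n_iter, burn_in, thin, seed=0):
    """Random-walk Metropolis that updates one parameter at a time."""
    rng = random.Random(seed)
    current = list(init)
    current_lp = log_post(current)
    kept = []
    for it in range(n_iter):
        for k in range(len(current)):
            proposal = list(current)
            proposal[k] += rng.gauss(0.0, steps[k])  # symmetric proposal
            proposal_lp = log_post(proposal)
            log_ratio = proposal_lp - current_lp
            if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
                current, current_lp = proposal, proposal_lp
        # Discard burn-in, then keep every `thin`-th scan.
        if it >= burn_in and (it - burn_in) % thin == 0:
            kept.append(list(current))
    return kept


TOY_MEANS = (1.0, -2.0, 0.5)  # hypothetical target means for the toy example


def toy_log_post(theta):
    """Log-density (up to a constant) of independent unit-variance normals."""
    return -0.5 * sum((x - m) ** 2 for x, m in zip(theta, TOY_MEANS))


draws = metropolis_componentwise(
    toy_log_post, init=(0.0, 0.0, 0.0), steps=(1.0, 1.0, 1.0),
    n_iter=6000, burn_in=1000, thin=5,
)
posterior_means = [sum(d[k] for d in draws) / len(draws) for k in range(3)]
```

For the model in the text, `log_post` would be the log-likelihood of Equation (1) plus the log-priors, evaluated on $(\log \lambda_A, \log \lambda_B, \beta)$.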

Figure 5 presents the densities of the posterior distributions for the three model parameters.
The main scientific question of interest depends upon the marginal posterior distribution
for $\beta$. Since the empirical $P(\beta > 0) = 1$, as the number of games played increases so
does the precision. We interpret this increase in the precision (or decrease in variance)
as statistical learning, which can also be considered learning in the game theoretic sense,
since increasing the precision leads to the SPNE for the QRE model specified by Equation (1)
of Section 4.2. Also, it can be seen in the figure that $\lambda_B < \lambda_A$; in fact the empirical $P(\lambda_B < \lambda_A) = 1$,
suggesting that the Players A have higher base precision. It is important to note that $\lambda_A$
and $\lambda_B$ not only represent each player's base precision about their utilities, but also
represent each player's estimates about the precision of the other type of player. Thus, $\lambda_B < \lambda_A$
suggests that both populations of A's and B's "estimate" that Players A are more "certain"
(i.e., have a higher base level of precision) in their choices throughout the game. This matter
will be discussed further in Section 5.




Figure 5: The marginal posterior distributions for $\lambda_A$, $\lambda_B$, and $\beta$. The posterior means are 3.275,
1.082, and 0.034.



The notion of learning can be seen more explicitly in Figure 6, which plots the probability
of the 5 different outcomes in relation to the game number (based upon the means of the
posterior distributions of the parameters). As the number of games is extrapolated to 100,
the probability of the first outcome, $P(y = 1)$, goes to 1.
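
The extrapolation can be sketched by evaluating the outcome probabilities at the reported posterior means. The sketch assumes that Player A's decisions ($p1$, $p2$) use precision $\lambda_A e^{\beta t}$, that Player B's decisions ($q1$, $q2$) use $\lambda_B e^{\beta t}$, and that $q1$ and $p1$ follow from the same expected utility argument as $p2$; names are illustrative.

```python
import math


def logistic_cdf(x, precision):
    """P[epsilon < x] for a logistic error with the given precision (stable form)."""
    z = precision * x
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def outcome_probs_at_game(t, lam_a, lam_b, beta):
    """(theta1, ..., theta5) for game number t under the learning QRE model."""
    prec_a = lam_a * math.exp(beta * t)  # precision of Player A's errors
    prec_b = lam_b * math.exp(beta * t)  # precision of Player B's errors
    q2 = logistic_cdf(3.20 - 1.60, prec_b)
    a_pass3 = q2 * 0.80 + (1 - q2) * 6.40
    p2 = logistic_cdf(1.60 - a_pass3, prec_a)
    b_pass2 = p2 * 0.40 + (1 - p2) * (q2 * 3.20 + (1 - q2) * 1.60)
    q1 = logistic_cdf(0.80 - b_pass2, prec_b)
    a_pass1 = q1 * 0.20 + (1 - q1) * (p2 * 1.60 + (1 - p2) * a_pass3)
    p1 = logistic_cdf(0.40 - a_pass1, prec_a)
    return [
        p1,
        (1 - p1) * q1,
        (1 - p1) * (1 - q1) * p2,
        (1 - p1) * (1 - q1) * (1 - p2) * q2,
        (1 - p1) * (1 - q1) * (1 - p2) * (1 - q2),
    ]


POSTERIOR_MEANS = {"lam_a": 3.275, "lam_b": 1.082, "beta": 0.034}  # from Figure 5
curve = {t: outcome_probs_at_game(t, **POSTERIOR_MEANS) for t in (1, 10, 100)}
```

Under these assumptions the probability of the first outcome rises with the game number and is essentially 1 by game 100, while early games place most of their mass on the middle outcomes.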



Figure 6: Probability of the 5 outcomes vs the game number based upon the mean of the posterior
distribution of the parameters from the QRE model. The vertical line represents the end of the
observed data.

In order to check the model fit, we conducted four posterior predictive tests (Gelman
et al., 1997). Each test consists of generating the posterior distribution of a test statistic of
interest, $T(y_{rep} \mid \Lambda)$, based on replicate data sets $y_{rep} \mid \Lambda$ generated from the model, and then
comparing that distribution to the test statistic computed from the observed data, $T(y_{obs})$.
Here, $\Lambda$ represents the posteriors of the model parameters. In particular, 20,000 replicate
data sets $y_{rep}$ were generated from the joint posterior distribution of the parameters based
on the 20,000 MCMC scans. These tests are a way to see if the model is capturing features
of interest in the observed data. A way to quantify the notion of "capturing" is through
Bayesian p-values based on a particular test statistic: $P[|T(y_{rep} \mid \Lambda)| > T(y_{obs})]$. A small
p-value suggests that the model is not capturing the statistic of interest. For these data, we
are concerned with examining whether the model captures:

1. the trend of the outcomes against the game numbers; 

2. potential differences among the Players A; 

3. potential differences between the Players B; 

4. potential differences between the sessions. 

For (1), in order to make a comparison to the randomization test conducted in Section
3.1, the test statistic used was the slope of the linear trend of the outcomes versus the
number of games played. For (2)-(4), the variance between the Players A, Players B, or
sessions was compared to the variance within each of these groups via an F-statistic. Here
we use the term 'potential differences' since our model did not account for differences in each
of those groups, which a priori may be an adequate assumption. Before we state the four
test statistics explicitly, some additional notation is necessary:

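\begin{align*}
y_{[i,\cdot](s)} &= \sum_{j=1}^{N(s)} y_{[i,j](s)}, & \bar{y}_{[i,\cdot](s)} &= \frac{y_{[i,\cdot](s)}}{N(s)}, \\
y_{[\cdot,j](s)} &= \sum_{i=1}^{N(s)} y_{[i,j](s)}, & \bar{y}_{[\cdot,j](s)} &= \frac{y_{[\cdot,j](s)}}{N(s)}, \\
y_{[\cdot,\cdot](s)} &= \sum_{i=1}^{N(s)} \sum_{j=1}^{N(s)} y_{[i,j](s)}, & \bar{y}_{[\cdot,\cdot](s)} &= \frac{y_{[\cdot,\cdot](s)}}{N(s) \times N(s)}, \\
\bar{y}_{[\cdot,\cdot](\cdot)} &= \frac{\sum_{s=1}^{3} y_{[\cdot,\cdot](s)}}{\sum_{s=1}^{3} N(s) \times N(s)},
\end{align*}

with $\bar{t}_{[\cdot,\cdot](\cdot)}$ defined in the same way for the game numbers $t_{[i,j](s)}$.

In particular the four test statistics we considered are:

\begin{align*}
1.\quad \gamma &= \frac{\sum_{s=1}^{3} \sum_{i=1}^{N(s)} \sum_{j=1}^{N(s)} (t_{[i,j](s)} - \bar{t}_{[\cdot,\cdot](\cdot)})(y_{[i,j](s)} - \bar{y}_{[\cdot,\cdot](\cdot)})}{\sum_{s=1}^{3} \sum_{i=1}^{N(s)} \sum_{j=1}^{N(s)} (t_{[i,j](s)} - \bar{t}_{[\cdot,\cdot](\cdot)})^2}, \\
2.\quad F_{Players\,A} &= \frac{\sum_{s=1}^{3} \sum_{i=1}^{N(s)} \sum_{j=1}^{N(s)} (\bar{y}_{[i,\cdot](s)} - \bar{y}_{[\cdot,\cdot](\cdot)})^2 \,/\, (\sum_{s=1}^{3} N(s) - 1)}{\sum_{s=1}^{3} \sum_{i=1}^{N(s)} \sum_{j=1}^{N(s)} (y_{[i,j](s)} - \bar{y}_{[i,\cdot](s)})^2 \,/\, (\sum_{s=1}^{3} N(s) \times N(s) - \sum_{s=1}^{3} N(s))}, \\
3.\quad F_{Players\,B} &= \frac{\sum_{s=1}^{3} \sum_{i=1}^{N(s)} \sum_{j=1}^{N(s)} (\bar{y}_{[\cdot,j](s)} - \bar{y}_{[\cdot,\cdot](\cdot)})^2 \,/\, (\sum_{s=1}^{3} N(s) - 1)}{\sum_{s=1}^{3} \sum_{i=1}^{N(s)} \sum_{j=1}^{N(s)} (y_{[i,j](s)} - \bar{y}_{[\cdot,j](s)})^2 \,/\, (\sum_{s=1}^{3} N(s) \times N(s) - \sum_{s=1}^{3} N(s))}, \\
4.\quad F_{sessions} &= \frac{\sum_{s=1}^{3} \sum_{i=1}^{N(s)} \sum_{j=1}^{N(s)} (\bar{y}_{[\cdot,\cdot](s)} - \bar{y}_{[\cdot,\cdot](\cdot)})^2 \,/\, (3 - 1)}{\sum_{s=1}^{3} \sum_{i=1}^{N(s)} \sum_{j=1}^{N(s)} (y_{[i,j](s)} - \bar{y}_{[\cdot,\cdot](s)})^2 \,/\, (\sum_{s=1}^{3} N(s) \times N(s) - 3)}.
\end{align*}

In statistics 2 to 4 the numerator is the between-group mean square ($MS_{Players\,A}$, $MS_{Players\,B}$,
or $MS_{sessions}$) and the denominator is the corresponding within-group mean square.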



The second row of Figure 7 presents the results for the posterior predictive tests (PP).
The first panel in that row depicts the posterior distribution of the slope $\gamma(y_{rep} \mid \Lambda)$, while
the vertical line represents the test statistic from the observed data, $\gamma(y_{obs})$. The one-sided
Bayesian p-value for this posterior predictive test is $P[\gamma(y_{rep} \mid \Lambda) < \gamma(y_{obs})] = 0.421$, which
suggests that the model is capturing the slope of the linear trend. In comparison, the Bayesian
p-values for the tests examining the differences among the Players A and Players B are
both zero, suggesting that the model could be expanded to allow for differences among the
subjects within each player type. Finally, the last panel suggests that incorporating a session
effect into the model may not be necessary since $P[F_{sessions}(y_{rep} \mid \Lambda) > F_{sessions}(y_{obs})] = 0.056$,
taking 0.05 as the cut-off value.

In comparison, the top row of Figure 7 presents the results for a set of randomization
tests. The first panel replicates the results from Section 3.1 and allows for a comparison
between the randomization test and the posterior predictive test when examining the trend.
Recall that the hypothesis for the randomization test was $H_0$: the game number does not
affect the outcome. From this randomization test, we concluded that we could reject the
null hypothesis. Now, through modeling the trend, the posterior predictive test suggests that
our model is capturing that trend. This approach allows one to initially investigate a set
of hypotheses of interest via the randomization testing approach and then, using the same
test statistics, compare the results based upon a particular model using posterior predictive
tests. In regard to randomization tests and hypothesis testing in general, Besag and Diggle
(1977) state "we contend that significance testing is rarely to be treated as an end in itself,
its purpose being more usually as an aid in suggesting further hypotheses relevant data
collection". Or, in this case, as a preliminary tool for exploratory data analysis which can
lead to further modeling. The next two panels in the top row examine the following two null
hypotheses:

1. $H_0$: there are no differences among the Players A;

2. $H_0$: there are no differences among the Players B.

The following procedure was used for the randomization tests: (1) within each session,
a random permutation of the Latin square, with either Players A in the center or Players B
in the table entries, was conducted under the null hypothesis of no difference; (2) the three
sessions were combined and the appropriate F-statistic was computed; (3) this procedure
was repeated 1,000 times, leading to the null distributions displayed in Figure 7. From the
histograms labeled 'Players A (R)' and 'Players B (R)' it is clear that the p-values are
zero, and we should reject the null hypotheses. These results coincide with the posterior
predictive tests, which were based on a model that did not account for differences among
the subjects. Finally, we are unable to conduct a randomization test for differences among the
three sessions which is faithful to the design. Since different subjects are nested within each
session and each session has a Latin square design, we cannot simply permute a subject between
sessions without destroying the Latin square design. We could conduct a randomization test
which ignored the design; these are typically called 'unrestricted' randomization tests since
each subject can be allocated to any treatment combination. However, as Garthwaite, Jolliffe
and Jones (1995) note, "[t]his has disadvantages for testing whether one factor affected the
responses, since the influence of other factors may bias the results". Since we have a model
at this point, we will rely on a posterior predictive test to examine the question of differences
among sessions and forgo the unrestricted randomization test.
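
The general shape of a restricted randomization test can be sketched in Python. This is a simplified illustration under stated assumptions, not the procedure used in the paper: it assumes a single numeric outcome per game, it shuffles outcomes among players within each session instead of permuting the Latin square itself, and the function names and toy data are hypothetical.

```python
import random
from collections import defaultdict

def f_statistic(labels, values):
    """One-way ANOVA F-statistic for `values` grouped by `labels`."""
    groups = defaultdict(list)
    for lab, val in zip(labels, values):
        groups[lab].append(val)
    n, k = len(values), len(groups)
    grand = sum(values) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2
                     for g in groups.values())
    ss_within = sum((v - sum(g) / len(g)) ** 2
                    for g in groups.values() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def randomization_pvalue(sessions, n_perm=1000, seed=0):
    """sessions: list of (player_labels, outcomes) pairs, one per session.

    Outcomes are shuffled only within a session (the restriction), the
    sessions are then pooled, and the F-statistic is recomputed.
    """
    rng = random.Random(seed)
    labels = [lab for labs, _ in sessions for lab in labs]
    observed = f_statistic(labels, [v for _, vals in sessions for v in vals])
    exceed = 0
    for _ in range(n_perm):
        permuted = []
        for _, vals in sessions:
            shuffled = list(vals)
            rng.shuffle(shuffled)  # permute within the session only
            permuted.extend(shuffled)
        if f_statistic(labels, permuted) >= observed:
            exceed += 1
    return observed, exceed / n_perm

# Hypothetical toy data: two sessions, three players each, three games each.
toy_sessions = [
    (["A1"] * 3 + ["A2"] * 3 + ["A3"] * 3, [1, 2, 1, 3, 3, 4, 5, 4, 5]),
    (["A4"] * 3 + ["A5"] * 3 + ["A6"] * 3, [1, 1, 2, 3, 4, 3, 5, 5, 4]),
]
observed_f, p_value = randomization_pvalue(toy_sessions)
print(observed_f, p_value)
```

Because outcomes never cross session boundaries, differences between sessions are held fixed in every permuted data set, which is the sense in which the test respects part of the design.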
Figure 7: The top row shows results from the randomization tests (R) and the bottom row shows results
from the posterior predictive tests (PP). In each column the same test statistic was used.

5 A QRE Random Effects Model of Learning



All of the previous models have assumed no subject-specific effect for the Players A or the
Players B; however, repeated observations from the same subject could result in statistical
dependence among outcomes that are assumed to be independent in the previous QRE
models. As was noted from both the randomization tests and the posterior predictive
tests, there do appear to be substantial differences between the subjects (either Players
A or Players B). In order to account for this correlation, a random effects model was
employed. A QRE random effects model is easily developed through a Bayesian hierarchical
approach. Now, each subject has their own set of parameters, which come from a population
of parameters for each player type: $\lambda_{A_i}$, $\beta_{A_i}$, $\lambda_{B_j}$, and $\beta_{B_j}$.

Another important consideration is that the experimental design is such that players do
not know whom they are playing against. The QRE, however, assumes that every player
knows every other player's error distribution. In the simple case, where we modeled two
player types (Players A and Players B) and did not consider individual subject effects, this
probabilistic model induces relationships between $\lambda_A$, $\lambda_B$ and the probabilities of choosing
Take at various stages of the game, $(q_2, p_2, q_1, p_1)_{[i,j](s)}$, that are not as simple as in the
one-parameter model depicted in Figure 4. To clarify the point, Figure 8 demonstrates the
relationship between $\lambda_A$, $\lambda_B$, and $p_1$ (the probability that Player A will choose Take at the first
stage of the game), based on the model defined by Equations 4.2 where, without loss of
generality, we take $\beta = 0$. The figure shows that simple statements cannot be made about
$p_1$ as $\lambda_A$ increases. The probability for Player A to choose Take at stage 1 depends upon
the specification of $\lambda_B$, and could go to 1 or to 0 (away from the SPNE) as $\lambda_A$ increases.
The reason for this lies in the assumptions of the QRE model, in that the players are
assumed to know everyone else's error distribution. Thus, if $\lambda_B$ is small, suggesting that
Player B is nearly indifferent between choosing Take or Pass, then Player A will maximize
her expected utility by choosing Pass as $\lambda_A$ increases. This is the point that McKelvey and
Palfrey (1996) make in their example about chess players: "[Consider] a chess game between
an expert and a beginner. If this were common knowledge, then the expert might adopt a
different strategy than she would against another expert."
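
The dependence of $p_1$ on both precision parameters can be checked numerically with a short backward-induction sketch. It is an illustration only and makes several assumptions: $\beta = 0$, choice probabilities given by the logistic function applied to precision times the difference in expected payoffs, and payoffs taken from the standard four-move centipede game. The stage 3 and stage 4 payoffs match those used later in this section, while Player A's payoffs of 0.40 (Take at stage 1) and 0.20 (Player B takes at stage 2) do not appear in this section and are assumed. The function names are hypothetical.

```python
import math

def inv_logit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def take_probabilities(lam_a, lam_b):
    """Return (p1, q1, p2, q2) for a two-type QRE with beta = 0."""
    # Stage 4, Player B: Take pays 3.20, Pass pays 1.60.
    q2 = inv_logit(lam_b * (3.20 - 1.60))
    # Stage 3, Player A: Take pays 1.60; Pass pays 0.80 or 6.40.
    p2 = inv_logit(lam_a * (1.60 - q2 * 0.80 - (1 - q2) * 6.40))
    # Stage 2, Player B: Take pays 0.80; Pass pays 0.40, 3.20 or 1.60.
    q1 = inv_logit(lam_b * (0.80 - p2 * 0.40
                            - (1 - p2) * q2 * 3.20
                            - (1 - p2) * (1 - q2) * 1.60))
    # Stage 1, Player A: Take pays 0.40 (assumed payoff).
    # Passing pays 0.20 (assumed) if B takes at stage 2, else continues.
    pass_value = (q1 * 0.20
                  + (1 - q1) * (p2 * 1.60
                                + (1 - p2) * (q2 * 0.80 + (1 - q2) * 6.40)))
    p1 = inv_logit(lam_a * (0.40 - pass_value))
    return p1, q1, p2, q2

# A precise Player A facing a nearly random Player B versus a precise Player B.
p1_vs_noisy_b = take_probabilities(50.0, 0.01)[0]
p1_vs_precise_b = take_probabilities(50.0, 50.0)[0]
print(p1_vs_noisy_b, p1_vs_precise_b)
```

Under these assumptions, a large $\lambda_A$ drives $p_1$ toward 0 when $\lambda_B$ is small and toward 1 when $\lambda_B$ is large, which is the qualitative pattern described for Figure 8.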

When we only model player types, the assumption that subjects know the parameters
may not seem unreasonable, but when we move to a random effects model where each subject
has their own set of parameters, this assumption of knowledge about the other players
is far too strong. This is especially important considering that the subjects did not know
whom they were playing against. To return to a slightly less restrictive assumption, we
assume that subjects do not know the parameters of the particular subject they are playing
against, but do know the empirical means of the parameters associated with the opposing
player type. Thus each choice is a probabilistic function of the subject-specific $\lambda$ and $\beta$
parameters, the empirical means of the $\lambda$ and $\beta$ parameters of the opposing player type,
and the current game number:

Player As' decisions:
\[ P(\text{Take})_{A_i} = F_{A_i}(\lambda_{A_i}, \beta_{A_i}, \bar{\lambda}_B, \bar{\beta}_B, t), \quad i \in \{1, \ldots, 29\}. \]
Player Bs' decisions:
\[ P(\text{Take})_{B_j} = F_{B_j}(\lambda_{B_j}, \beta_{B_j}, \bar{\lambda}_A, \bar{\beta}_A, t), \quad j \in \{1, \ldots, 29\}. \tag{2} \]

Figure 8: The probability that a Player A chooses Take at the first stage
of the game ($p_1$) depends not only on Player A's precision parameter $\lambda_A$ but also on the precision
parameter $\lambda_B$ of Player B.

The above can further be clarified as follows. The probability that Player $B_j$ will choose
Take at stage 4 of the game ($q_2$) when playing against Player $A_i$ is given by:

\[ q_{2,[i,j](s)} = \operatorname{logit}^{-1}\!\left[ \lambda_{B_j} \exp(\beta_{B_j} t_{[i,j](s)}) (3.20 - 1.60) \right]. \]

Now the probability that Player $A_i$ will choose Take at stage 3 of the game ($p_2$) when
playing against Player $B_j$ depends on the empirical means of the Players B:

\[ p_{2,[i,j](s)} = \operatorname{logit}^{-1}\!\left[ \lambda_{A_i} \exp(\beta_{A_i} t_{[i,j](s)}) \left( 1.60 - \bar{q}_{2,[i,j](s)}\, 0.80 - (1 - \bar{q}_{2,[i,j](s)})\, 6.40 \right) \right], \]
\[ \bar{q}_{2,[i,j](s)} = \operatorname{logit}^{-1}\!\left[ \bar{\lambda}_{B} \exp(\bar{\beta}_{B} t_{[i,j](s)}) (3.20 - 1.60) \right]. \]

Next, the probability that Player $B_j$ will choose Take at stage 2 of the game ($q_1$) when
playing against Player $A_i$ depends on the empirical means related to the Players A. Note that
in Equation 4, $q_{2,[i,j](s)}$ is known to Player $B_j$ when determining $q_{1,[i,j](s)}$ as long as it is not
embedded in $\bar{p}_{2,[i,j](s)}$, which is the case in Equation 5. Thus subjects are always assumed to
know their own probabilities as long as they are not embedded in another subject's decision
probabilities. We feel this is a reasonable assumption and is much preferred to subjects
knowing exactly the parameters of their opponents; however, it is an area of inquiry which
could be investigated further.

\[ q_{1,[i,j](s)} = \operatorname{logit}^{-1}\!\left[ \lambda_{B_j} \exp(\beta_{B_j} t_{[i,j](s)}) \left( 0.80 - \bar{p}_{2,[i,j](s)}\, 0.40 - (1 - \bar{p}_{2,[i,j](s)})\, q_{2,[i,j](s)}\, 3.20 - (1 - \bar{p}_{2,[i,j](s)})(1 - q_{2,[i,j](s)})\, 1.60 \right) \right], \tag{3} \]

\[ \bar{p}_{2,[i,j](s)} = \operatorname{logit}^{-1}\!\left[ \bar{\lambda}_{A} \exp(\bar{\beta}_{A} t_{[i,j](s)}) \left( 1.60 - \bar{q}_{2,[i,j](s)}\, 0.80 - (1 - \bar{q}_{2,[i,j](s)})\, 6.40 \right) \right]. \tag{4} \]
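
The decision probabilities for stages 2 through 4 can be evaluated with a small Python sketch. It is illustrative only: it assumes that the link is the logistic function, that barred quantities are computed from the empirical means of the opposing (or own) player type as described in the text, and that the game number `t` enters through `lambda * exp(beta * t)`. The function and argument names are hypothetical.

```python
import math

def inv_logit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def re_take_probs(lam_ai, beta_ai, lam_bj, beta_bj,
                  lam_a_bar, beta_a_bar, lam_b_bar, beta_b_bar, t):
    """Take probabilities q2, p2, q1 for Player A_i versus Player B_j at game t."""
    prec_ai = lam_ai * math.exp(beta_ai * t)           # subject A_i precision
    prec_bj = lam_bj * math.exp(beta_bj * t)           # subject B_j precision
    prec_a_bar = lam_a_bar * math.exp(beta_a_bar * t)  # Players A, empirical means
    prec_b_bar = lam_b_bar * math.exp(beta_b_bar * t)  # Players B, empirical means

    # Stage 4: B_j's own probability, and the version A_i is assumed to use.
    q2 = inv_logit(prec_bj * (3.20 - 1.60))
    q2_bar = inv_logit(prec_b_bar * (3.20 - 1.60))

    # Stage 3: A_i's own probability, and the version B_j is assumed to use.
    stage3_gain = 1.60 - q2_bar * 0.80 - (1 - q2_bar) * 6.40
    p2 = inv_logit(prec_ai * stage3_gain)
    p2_bar = inv_logit(prec_a_bar * stage3_gain)

    # Stage 2: B_j uses its own q2 directly, but p2_bar for the opponent.
    q1 = inv_logit(prec_bj * (0.80 - p2_bar * 0.40
                              - (1 - p2_bar) * q2 * 3.20
                              - (1 - p2_bar) * (1 - q2) * 1.60))
    return {"q2": q2, "p2": p2, "q1": q1}

# Hypothetical parameter values: Player B_j's precision grows with the game number.
early = re_take_probs(1.0, 0.0, 1.0, 0.1, 1.0, 0.0, 1.0, 0.0, t=1)
late = re_take_probs(1.0, 0.0, 1.0, 0.1, 1.0, 0.0, 1.0, 0.0, t=10)
print(early, late)
```

With a positive slope for Player $B_j$, the stage 4 Take probability rises with the game number, which is how learning appears in this parameterization.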



Based on these considerations, we can fully define the QRE random effects model. Here
again we transform the precision terms $\lambda_{A_i}$, $\lambda_{B_j}$ by modeling $\delta_{A_i} = \log(\lambda_{A_i})$, $\delta_{B_j} = \log(\lambda_{B_j})$,
which creates a random slope and intercept form of the model. A priori we assume that
the sampling distributions for the intercepts and slopes for each subject are not correlated,
which appears to be justified by the scatter plots in Figure 11, since no strong linear pattern
is present. Finally, the priors were chosen to be conjugate but diffuse. Altogether the model
is:



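\[
y_{[i,j](s)} \sim \text{multinomial}\big(\theta^{(1)}_{[i,j](s)}, \ldots, \theta^{(5)}_{[i,j](s)}\big).
\]

$\theta^{(1)}_{[i,j](s)}, \ldots, \theta^{(5)}_{[i,j](s)}$ are determined by the game tree in Figure 1 and
the following QRE specifications:

\[
\begin{aligned}
\text{Players A:}\quad & (\epsilon(p_1), \epsilon(p_2))_{[i,j](s)} \sim \text{logistic}\big(\text{shape} = 0,\ \text{precision} = e^{\log(\lambda_{A_i}) + \beta_{A_i} t} = e^{\delta_{A_i} + \beta_{A_i} t}\big), \\
\text{Players B:}\quad & (\epsilon(q_1), \epsilon(q_2))_{[i,j](s)} \sim \text{logistic}\big(\text{shape} = 0,\ \text{precision} = e^{\log(\lambda_{B_j}) + \beta_{B_j} t} = e^{\delta_{B_j} + \beta_{B_j} t}\big), \\
& \delta_{A_i} \sim \text{normal}(\text{mean} = \mu_{\delta_A},\ \text{variance} = \sigma^2_{\delta_A}), \\
& \delta_{B_j} \sim \text{normal}(\text{mean} = \mu_{\delta_B},\ \text{variance} = \sigma^2_{\delta_B}), \\
& \beta_{A_i} \sim \text{normal}(\text{mean} = \mu_{\beta_A},\ \text{variance} = \sigma^2_{\beta_A}), \\
& \beta_{B_j} \sim \text{normal}(\text{mean} = \mu_{\beta_B},\ \text{variance} = \sigma^2_{\beta_B}), \\
& \mu_{\delta_A}, \mu_{\delta_B}, \mu_{\beta_A}, \mu_{\beta_B} \sim \text{normal}(\text{mean} = 0,\ \text{variance} = 100), \\
& \sigma^2_{\delta_A}, \sigma^2_{\delta_B}, \sigma^2_{\beta_A}, \sigma^2_{\beta_B} \sim \text{inverse-gamma}(1, 1).
\end{aligned}
\tag{5}
\]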
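
To make the random slope and intercept structure concrete, the following Python sketch draws one set of subject-level parameters for a single player type from the priors stated above. It is a prior simulation for illustration, not the estimation procedure. The number of games per subject (`n_games`) and the function name are hypothetical, and log-precisions are returned instead of precisions to avoid overflow under such diffuse priors.

```python
import random

def draw_player_type(n_subjects=29, n_games=10, seed=0):
    """Draw hyperparameters, then per-subject intercepts and slopes.

    Returns a list with one dict per subject holding delta (intercept),
    beta (slope), and the log-precision delta + beta * t for each game t.
    """
    rng = random.Random(seed)
    # Hyperpriors: normal(mean 0, variance 100), so the standard deviation is 10.
    mu_delta = rng.gauss(0.0, 10.0)
    mu_beta = rng.gauss(0.0, 10.0)
    # inverse-gamma(1, 1) draws, obtained as the reciprocal of gamma(1, 1) draws.
    var_delta = 1.0 / rng.gammavariate(1.0, 1.0)
    var_beta = 1.0 / rng.gammavariate(1.0, 1.0)

    subjects = []
    for _ in range(n_subjects):
        delta = rng.gauss(mu_delta, var_delta ** 0.5)  # intercept: log(lambda)
        beta = rng.gauss(mu_beta, var_beta ** 0.5)     # slope on game number
        log_precision = [delta + beta * t for t in range(1, n_games + 1)]
        subjects.append({"delta": delta, "beta": beta,
                         "log_precision": log_precision})
    return subjects

players_a = draw_player_type(seed=1)
print(players_a[0]["delta"], players_a[0]["beta"])
```

Each subject's log-precision is a straight line in the game number, so a positive slope corresponds to precision that increases with experience.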

The model parameters were estimated by constructing a Markov chain.
We found the mixing of the parameters related to the Players B to be very slow, so we 
conducted a total of 20 million scans, of which the first 5 million were dropped for burn-in. 
We thinned the remaining scans by taking every 1,000th, which left us with 15,000 samples 
from the posterior distribution. Again the chains were inspected for convergence. 
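
The burn-in and thinning arithmetic above can be verified with a few lines of Python. The helper name is hypothetical, and the choice of which scan indices are retained (every 1,000th scan after burn-in) is an assumption about the bookkeeping, not a description of the actual sampler.

```python
def retained_scans(total_scans, burn_in, thin):
    """Indices (1-based) of the scans kept after burn-in and thinning."""
    return range(burn_in + thin, total_scans + 1, thin)

kept = retained_scans(total_scans=20_000_000, burn_in=5_000_000, thin=1_000)
print(len(kept))  # 15000
```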
The main parameters of interest are the population means and variances of $\delta_{A_i}$, $\delta_{B_j}$, $\beta_{A_i}$,
and $\beta_{B_j}$, whose posterior distributions can be seen in Figure 9. The $\delta$'s represent a base
level of precision for the population of Players A and Players B. Again we see that, on
average, the Players A have a higher base precision, but their base precision is more variable
compared to the Players B. The figure also presents the posterior distributions of the means
and variances of the $\beta_{A_i}$'s and $\beta_{B_j}$'s. The medians of the posterior distributions of $\mu_{\beta_A}$ and
$\mu_{\beta_B}$ are greater than zero. While the 95% credible intervals contain zero in both cases, 95%
and 51% of population A and population B, respectively, will have a mean for $\beta$ that is
greater than zero. Thus for that proportion of the population, as the game number increases
so will the precision, which we interpret as learning in both the statistical and game-theoretic
senses. Again there does appear to be greater variation among the Players A compared to
the Players B in regard to the $\beta$'s, but this is not nearly as great as was seen for the $\delta$'s.
Additionally, we can compare the distribution of the $\beta$'s for each subject by examining the
95% credible intervals in Figure 10. The dots and triangles in the figure are the medians.
The dots represent subjects where the probability that $\beta$ for that subject is greater than
0 is between 50% and 75%. The triangles represent subjects where the probability that $\beta$ for
that subject is greater than 0 is between 75% and 100%. For the Players A, 22 out of 29 have
medians greater than zero, compared to only 13 out of 29 for the Players B.
Finally, Figure 11 depicts a scatter plot of the medians of
each subject's $\delta$ and $\beta$, grouped again by player type. The lack of a linear pattern, or more
precisely the sphericity of the point clouds, supports our model assumption of no correlation
between the slope and intercept for each subject.
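
The per-subject summaries described for Figure 10 could be computed from posterior draws along the following lines. This is a sketch under assumptions: the draws are taken to be a plain list of sampled $\beta$ values for one subject, the interval is a simple empirical 2.5% to 97.5% range, and the treatment of the 75% boundary and of subjects below 50% (no marker) is a guess, since the text does not specify it. All names are hypothetical.

```python
import statistics

def summarize_beta(draws):
    """Median, 95% interval, P(beta > 0), and plotting marker for one subject."""
    ordered = sorted(draws)
    n = len(ordered)
    lower = ordered[int(0.025 * n)]               # empirical 2.5% point
    upper = ordered[min(n - 1, int(0.975 * n))]   # empirical 97.5% point
    prob_positive = sum(1 for d in draws if d > 0) / n
    if prob_positive >= 0.75:
        marker = "triangle"   # P(beta > 0) between 75% and 100%
    elif prob_positive >= 0.50:
        marker = "dot"        # P(beta > 0) between 50% and 75%
    else:
        marker = None         # not described in the text; assumed unmarked
    return {"median": statistics.median(draws), "interval": (lower, upper),
            "prob_positive": prob_positive, "marker": marker}

# Hypothetical draws: 100 evenly spaced values from -0.20 to 0.79.
toy_draws = [x / 100 for x in range(-20, 80)]
summary = summarize_beta(toy_draws)
print(summary)
```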

As before, it is important to check the fit of the model to the data using
posterior predictive tests. The same set of test statistics was employed and the results can
be seen in Figure 12. The first and second rows are the randomization tests and posterior
predictive tests from Figure 7. The last row presents the posterior predictive tests for the
random effects model (PP-RE). The p-values associated with these tests are 0.272, 0.057,
0, and 0.118, respectively. From these results, it appears that we are capturing the features
related to the trend and to differences between the Players A. While there is an extremely slight
shift in the histograms examining the differences between the Players B between the models
with and without random effects, this is a feature which the model still does not appropriately
capture. Since some Players B do pass at the last stage of the game, incorporating some
version of the 'altruistic' model suggested by McKelvey and Palfrey (1998) may lead to a
better fit. A two-component mixture model for the random effects associated with the Players
B may pick up on this notion of altruism. However, with only 29 subjects the estimation
may rely heavily on the priors, since potentially only a few individuals would end up in
the 'altruistic' group. Finally, since we allow for differences for each subject through the
random effects, we would expect to be able to also capture potential differences between the
sessions.
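
For reference, a posterior predictive p-value of the kind reported above can be computed as the share of replicated test statistics that are at least as large as the observed one. The sketch below assumes an upper-tail comparison and that the replicated statistics have already been computed from data sets simulated under posterior draws. The function name and numbers are hypothetical.

```python
def posterior_predictive_pvalue(observed_stat, replicated_stats):
    """Fraction of replicated statistics at least as large as the observed one."""
    n_extreme = sum(1 for r in replicated_stats if r >= observed_stat)
    return n_extreme / len(replicated_stats)

# Hypothetical values: one observed statistic and four replicated statistics.
example_p = posterior_predictive_pvalue(2.0, [1.0, 2.0, 3.0, 4.0])
print(example_p)  # 0.75
```

A value near zero indicates that the model rarely reproduces a statistic as extreme as the one observed, as is the case for the Players B statistic here.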
Figure 9: Posterior distributions of population parameters.

6 Conclusion 

In this article we have examined expansions of the QRE model to allow for game-theoretic 
learning via repeated play of a game. This was done by allowing the precision of players' 
error distributions to change as players gain experience with the game. In analyzing a 
data set on repeated plays of the two-player centipede game, it was found that such a
model fits better than the standard QRE model. The model was also expanded to allow for 
heterogeneity across experimental subjects by introducing random effects terms to capture 
variability in how players learn. This requires a modification of the QRE formulation, as it 
is unlikely that players know how fast their fellow game-players are learning (at least in the 
data sets considered in this paper). Instead, we assume each player makes a "best guess" 
at their fellow players' behavior, based on population averages of the parameters describing
behavior. Additionally, we employed both randomization tests for exploratory data analyses 
and posterior predictive tests for both exploratory and confirmatory data analyses. 

Several extensions to the approaches in this paper can be taken. In the
random effects model we assumed that players know their own probabilities as
long as they are not embedded in the other player's decision probabilities, but we could
globally allow them to know their own probabilities. As mentioned, it would be interesting
to explore mixture models with larger data sets or simulated data. Finally, another issue
that needs to be addressed is the distribution of player errors. QRE models,
as well as statistical choice models in general, are constrained by the type of player error
distributions that are employed, being either largest extreme value or Gaussian. Quinn and
Westveld (2009) have utilized a semi-parametric approach to solve this problem within the
QRE framework. This method could also be applied with the QRE random effects model
to allow for greater flexibility in the modeling and testing of learning; again, a larger data
set would be needed so that the priors do not completely dominate the results.

Figure 10: 95% credible intervals of the $\beta$'s for each subject. The dots and triangles in the figure are the
medians and represent the associated probability that $\beta$ for that subject is greater than zero.

More generally, the modeling approach in this paper can be seen as an attempt to inform 
a statistical analysis of a complicated data set with an underlying behavioral model, or 
conversely, expand upon a behavioral model to allow for a more accurate description of 
observed data by incorporating certain characteristics of natural variability. Approaches such 
as this may help bridge the gap between the purely statistical analyses of social relations
data.

Figure 11: Scatter plots of the medians of each subject's slope and intercept for Players A and